Dual-Resonator Speed Meter for a Free Test Mass
A description and analysis are given of a ``speed meter'' for monitoring a
classical force that acts on a test mass. This speed meter is based on two
microwave resonators (``dual resonators''), one of which couples evanescently
to the position of the test mass. The sloshing of the resulting signal between
the resonators, and a wise choice of where to place the resonators' output
waveguide, produce a signal in the waveguide that (for sufficiently low
frequencies) is proportional to the test-mass velocity (speed) rather than its
position. This permits the speed meter to achieve force-measurement
sensitivities better than the standard quantum limit (SQL), both when operating
in a narrow-band mode and a wide-band mode. A scrutiny of experimental issues
shows that it is feasible, with current technology, to construct a
demonstration speed meter that beats the wide-band SQL by a factor 2. A concept
is sketched for an adaptation of this speed meter to optical frequencies; this
adaptation forms the basis for a possible LIGO-III interferometer that could
beat the gravitational-wave standard quantum limit h_SQL, but perhaps only by a
factor 1/xi = h_SQL/h ~ 3 (constrained by losses in the optics) and at the
price of a very high circulating optical power --- larger by 1/xi^2 than that
required to reach the SQL.
Comment: RevTex: 13 pages with 4 embedded figures (two .eps format and two drawn in TeX); Submitted to Physical Review
Machine Learning at Microsoft with ML .NET
Machine Learning is transitioning from an art and science into a technology
available to every developer. In the near future, every application on every
platform will incorporate trained models to encode data-based decisions that
would be impossible for developers to author. This presents a significant
engineering challenge, since currently data science and modeling are largely
decoupled from standard software development processes. This separation makes
incorporating machine learning capabilities inside applications unnecessarily
costly and difficult, and furthermore discourages developers from embracing ML
in the first place. In this paper we present ML .NET, a framework developed at
Microsoft over the last decade in response to the challenge of making it easy
to ship machine learning models in large software applications. We present its
architecture, and illuminate the application demands that shaped it.
Specifically, we introduce DataView, the core data abstraction of ML .NET which
allows it to capture full predictive pipelines efficiently and consistently
across training and inference lifecycles. We close the paper with a
surprisingly favorable performance study of ML .NET compared to more recent
entrants, and a discussion of some lessons learned.
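The train/inference consistency that the paper attributes to DataView can be illustrated with a toy pipeline. The class and method names below are illustrative only, not ML .NET's actual API, and the sketch is eager whereas DataView is additionally lazy and streaming; it only shows the core idea that statistics learned during training are replayed identically at inference.

```python
class Normalize:
    """Min-max normalize one numeric column. The statistics learned in
    fit() are reused verbatim at inference, avoiding train/serve skew."""
    def __init__(self, col):
        self.col = col
    def fit(self, rows):
        vals = [r[self.col] for r in rows]
        self.lo, self.hi = min(vals), max(vals)
    def transform(self, rows):
        span = (self.hi - self.lo) or 1.0
        return [{**r, self.col: (r[self.col] - self.lo) / span} for r in rows]

class Pipeline:
    """Toy DataView-style pipeline: the full chain of transforms is
    captured once and applied identically in training and inference."""
    def __init__(self, steps):
        self.steps = steps  # objects exposing fit(rows) / transform(rows)
    def fit(self, rows):
        for step in self.steps:
            step.fit(rows)
            rows = step.transform(rows)
        return self
    def transform(self, rows):
        for step in self.steps:
            rows = step.transform(rows)
        return rows
```

For example, `Pipeline([Normalize("x")]).fit(train)` fixes the column's min and max from the training rows, and every later `transform` call applies exactly those values.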
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus-
Learnable similarity functions and their applications to record linkage and clustering. Doctoral Dissertation Proposal
Many machine learning tasks require similarity functions that estimate likeness between observations. Similarity computations are particularly important for clustering and record linkage algorithms that depend on accurate estimates of the distance between datapoints. However, standard measures such as string edit distance and Euclidean distance often fail to capture an appropriate notion of similarity for a particular domain or dataset. This problem can be alleviated by employing learnable similarity functions that adapt using training data. In this proposal, we introduce two adaptive string similarity measures: (1) Learnable Edit Distance with Affine Gaps, and (2) Learnable Vector-Space Similarity Based on Pairwise Classification. These similarity functions can be trained using a corpus of labeled pairs of equivalent and non-equivalent strings. We illustrate the accuracy improvements obtained with these measures using MARLIN, our system for record linkage in databases that learns to combine adaptive and static string similarity functions in a two-level learning framework. Obtaining useful training examples for learnable similarity functions can be problematic due to scarcity of informative similar and dissimilar object pairs. We propose two strategies, Static-Active Selection and Weakly-Labeled Selection, that facilitate efficient training data collection for record linkage.
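The first measure builds on the classic affine-gap alignment recurrence (Gotoh's algorithm), which charges a large cost to open a gap and a smaller cost to extend it. In the proposal the costs are learned from labeled string pairs; the fixed values below are illustrative defaults only, meant to show the recurrence the learnable variant parameterizes.

```python
def affine_gap_distance(s, t, mismatch=1.0, gap_open=2.0, gap_extend=0.5):
    """Edit distance with affine gap costs (Gotoh's recurrence).
    The cost parameters are illustrative; the proposal's learnable
    variant estimates them from equivalent/non-equivalent string pairs."""
    INF = float("inf")
    n, m = len(s), len(t)
    # M: alignment ends in a (mis)match; X: gap in t; Y: gap in s.
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = sub + min(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1])
            X[i][j] = min(M[i-1][j] + gap_open, X[i-1][j] + gap_extend)
            Y[i][j] = min(M[i][j-1] + gap_open, Y[i][j-1] + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])
```

With these defaults, deleting a contiguous run of three characters costs 2.0 + 2 × 0.5 = 3.0, less than three independent single-character deletions; this is what makes affine gaps suit abbreviations and truncations in database fields.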
Learnable similarity functions and their application to record linkage and clustering
Many machine learning and data mining tasks depend on functions that estimate similarity
between instances. Similarity computations are particularly important in clustering and
information integration applications, where pairwise distances play a central role in many
algorithms. Typically, algorithms for these tasks rely on pre-defined similarity measures,
such as edit distance or cosine similarity for strings, or Euclidean distance for vector-space
data. However, standard distance functions are frequently suboptimal as they do not capture
the appropriate notion of similarity for a particular domain, dataset, or application.
In this thesis, we present several approaches for addressing this problem by employing
learnable similarity functions. Given supervision in the form of similar or
dissimilar pairs of instances, learnable similarity functions can be trained to provide accurate
estimates for the domain and task at hand. We study the problem of adapting similarity
functions in the context of several tasks: record linkage, clustering, and blocking. For each
of these tasks, we present learnable similarity functions and training algorithms that lead to
improved performance.
In record linkage, also known as duplicate detection and entity matching, the goal
is to identify database records referring to the same underlying entity. This requires estimating
similarity between corresponding field values of records, as well as overall similarity
between records. For computing field-level similarity between strings, we describe
two learnable variants of edit distance that lead to improvements in linkage accuracy. For
learning record-level similarity functions, we employ Support Vector Machines to combine
similarities of individual record fields in proportion to their relative importance, yielding
a high-accuracy linkage system. We also investigate strategies for efficient collection of
training data which can be scarce due to the pairwise nature of the record linkage task.
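The record-level combination step can be sketched as a linear model over per-field similarities, assuming token-set Jaccard as the base field measure. The thesis uses a Support Vector Machine for the combiner; a dependency-free perceptron is substituted below purely to keep the sketch self-contained, but the learned per-field weights play the same role.

```python
def jaccard(a, b):
    """Token-overlap similarity, a stand-in per-field base measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def train_pair_classifier(pairs, labels, fields, epochs=200, lr=0.1):
    """Learn one weight per field plus a bias over labeled record pairs
    (label 1 = same entity). A perceptron replaces the thesis's SVM
    here only to avoid external dependencies."""
    w = [0.0] * len(fields)
    b = 0.0
    for _ in range(epochs):
        for (r1, r2), y in zip(pairs, labels):
            feats = [jaccard(r1[f], r2[f]) for f in fields]
            pred = 1 if sum(wi * x for wi, x in zip(w, feats)) + b > 0 else 0
            err = y - pred
            if err:
                w = [wi + lr * err * x for wi, x in zip(w, feats)]
                b += lr * err
    return w, b
```

After training, the learned weights reflect each field's relative importance for linkage decisions, which is exactly what the SVM-based combiner provides at higher accuracy.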
In clustering, similarity functions are essential as they determine the grouping of
instances that is the goal of clustering. We describe a framework for integrating learnable
similarity functions within a probabilistic model for semi-supervised clustering based on
Hidden Markov Random Fields (HMRFs). The framework accommodates learning various
distance measures, including those based on Bregman divergences (e.g., parameterized
Mahalanobis distance and parameterized KL-divergence), as well as directional measures
(e.g., cosine similarity). Thus, it is applicable to a wide range of domains and data representations.
Similarity functions are learned within the HMRF-KMEANS algorithm derived
from the framework, leading to significant improvements in clustering accuracy.
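The constrained assignment step of such an algorithm can be sketched as follows. This is a simplified E-step with a fixed squared-Euclidean metric and a hypothetical penalty weight `w`; the full HMRF-KMEANS algorithm additionally re-estimates centroids and learns the distance parameters themselves, both omitted here.

```python
def hmrf_assign(points, centroids, must_link, cannot_link, w=10.0):
    """One constrained assignment sweep: each point picks the cluster
    minimizing squared distance plus a penalty w per violated pairwise
    constraint (ICM-style greedy pass over the points).
    Centroid updates and metric learning are deliberately omitted."""
    def sqdist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    # Unconstrained nearest-centroid initialization.
    labels = [min(range(len(centroids)), key=lambda k: sqdist(p, centroids[k]))
              for p in points]
    # Greedy sweep: revisit each point with constraint penalties included.
    for i, p in enumerate(points):
        def cost(k):
            c = sqdist(p, centroids[k])
            c += sum(w for (a, b) in must_link
                     if i in (a, b) and labels[a if b == i else b] != k)
            c += sum(w for (a, b) in cannot_link
                     if i in (a, b) and labels[a if b == i else b] == k)
            return c
        labels[i] = min(range(len(centroids)), key=cost)
    return labels
```

A must-link pair pulls a borderline point into its partner's cluster, and a cannot-link pair pushes it out, whenever the penalty outweighs the distance difference.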
The third application we consider, blocking, is critical in making record linkage
and clustering algorithms scalable to large datasets, as it facilitates efficient selection of
approximately similar instance pairs without explicitly considering all possible pairs. Previously
proposed blocking methods require manually constructing a similarity function or
a set of similarity predicates, followed by hand-tuning of parameters. We propose learning
blocking functions automatically from linkage and semi-supervised clustering supervision,
which allows automatic construction of blocking methods that are efficient and accurate.
This approach yields computationally cheap learnable similarity functions that can be used
for scaling up in a variety of tasks that rely on pairwise distance computations, including
record linkage and clustering.
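Key-based blocking, the baseline that the learned predicates replace, can be sketched as an inverted index: records sharing any blocking-key value fall into a common block, and only within-block pairs are compared, avoiding the full quadratic pass. The two predicates below (name prefix, exact city) are hand-written stand-ins for a learned predicate set.

```python
from collections import defaultdict

def candidate_pairs(records, blocking_keys):
    """Emit candidate record-index pairs via blocking. Each blocking
    key is a function of a record; the keys here are illustrative
    stand-ins for the predicates the thesis proposes to learn."""
    blocks = defaultdict(set)
    for i, rec in enumerate(records):
        for key in blocking_keys:
            blocks[key(rec)].add(i)
    pairs = set()
    for ids in blocks.values():
        ids = sorted(ids)
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                pairs.add((ids[a], ids[b]))
    return pairs
```

Tagging each key's output with the predicate name (as in the test below) keeps values from different predicates from colliding in one block.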